TABLE OF CONTENTS¶

1. Dataset Overview¶

2. Investigation Overview¶

3. Univariate Exploration¶

4. Bivariate Exploration¶

5. Multivariate Exploration¶

6. References¶

Dataset Overview¶

Prosper was founded in 2005 as the first peer-to-peer lending marketplace in the United States. Since then, Prosper has facilitated more than $22 billion in loans to more than 1,320,000 people.

Through Prosper, people can invest in each other in a way that is financially and socially rewarding.

The dataset comprises over 110,000 peer-to-peer loans issued on the Prosper lending platform and includes more than 80 variables. I concentrated on roughly ten of them and dropped variables that lacked data in the areas I was examining. I also removed 39 outliers with a stated income above $50,000 per month because they were skewing the statistics. The study uses information on roughly 77 thousand loans. I grouped the various compliance and delinquency levels into two categories: Compliant and Delinquent.
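The outlier filter and the two-way status grouping described above could look roughly like the sketch below. It is a minimal illustration on a tiny synthetic table, not the original notebook code: the `clean_loans` helper, the `COMPLIANT` set and the sample values are assumptions, and which statuses count as compliant is a guess.

```python
import pandas as pd

# Tiny synthetic stand-in for the Prosper loan table (values are made up).
loans = pd.DataFrame({
    "LoanOriginalAmount": [4000, 10000, 15000, 4000, 25000],
    "StatedMonthlyIncome": [3000.0, 5500.0, 80000.0, 4200.0, 9000.0],
    "LoanStatus": ["Current", "Completed", "Current",
                   "Defaulted", "Past Due (16-30 days)"],
})

# Assumption: these statuses are treated as "Compliant"; all others as "Delinquent".
COMPLIANT = {"Current", "Completed", "FinalPaymentInProgress"}


def clean_loans(df, income_cap=50_000):
    """Remove income outliers, then collapse LoanStatus into two groups."""
    out = df[df["StatedMonthlyIncome"] <= income_cap].copy()
    is_compliant = out["LoanStatus"].isin(COMPLIANT)
    out["LoanStatus"] = is_compliant.map({True: "Compliant", False: "Delinquent"})
    return out.reset_index(drop=True)


cleaned = clean_loans(loans)
print(cleaned)
```

Keeping the income cap as a parameter makes it easy to check how sensitive later statistics are to the chosen threshold.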

Investigation Overview¶

In this investigation, I wanted to identify the loan characteristics that could be used to predict borrower APR. The main factors considered were the original loan amount, the borrower's Prosper rating, loan term, stated monthly income, employment status, and occupation.

What is the structure of your dataset?¶

The cleaned dataset includes details on 76,224 loans across 14 variables. The majority of the variables are numerical, while Loan Status is a nominal categorical variable.

What is/are the main feature(s) of interest in your dataset?¶

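The main features of interest are the original loan amount (LoanOriginalAmount), the borrower annual percentage rate (BorrowerAPR) and the borrower interest rate (BorrowerRate). The goal is to find out which factors affect them.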

What features in the dataset do you think will help support your investigation into your feature(s) of interest?¶

I expect the borrower's stated monthly income, the original loan amount requested, employment status, and occupation to affect the features of interest.

Now let's go over and start exploring!🕺🏾🕺🏾🕺🏾¶

Univariate Exploration of Some Selected Variables¶

Question 1¶

What is the distribution of the main variables of interest?

Observations¶

The distributions of BorrowerAPR and BorrowerRate are multimodal.

Observation¶

The histogram shows that 4k, 10k and 15k dollars are the most frequently borrowed amounts on Prosper.

Distribution of Stated Monthly Income of the borrowers¶

From the chart below, the stated monthly income is skewed to the right. This means that the majority of Prosper clients earn below 15k dollars per month.¶

Most of the borrowers are employed, as seen in the bar chart below.¶

Question 2¶

What is the count of borrowers by occupation type?

Observations¶

The top 5 most popular occupations of the borrowers are:¶

Professional

Executive

Computer

Programmer

Teacher

[Figure: Value Count of the Occupation Type]

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?¶

The Borrower APR distribution is slightly multimodal, with values between 0.05 and 0.4.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?¶

The distribution of stated monthly income is skewed to the right, and 97% of the borrowers earn below 15k dollars per month.
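A figure such as the share of borrowers below 15k can be read straight off the income column, and a positive skewness value is one way to confirm the right skew. The sketch below uses a handful of made-up incomes, so its numbers are illustrative only and do not reproduce the 97% reported for the real data.

```python
import pandas as pd

# Made-up monthly incomes with one large value to create a right tail.
incomes = pd.Series(
    [2500, 3200, 4100, 5000, 5800, 6500, 7200, 9000, 12000, 30000],
    name="StatedMonthlyIncome",
)

# Fraction of borrowers earning below 15k per month.
share_below_15k = (incomes < 15_000).mean()

# Sample skewness: a positive value indicates a longer right tail.
skewness = incomes.skew()

print(f"share below 15k: {share_below_15k:.0%}")
print(f"skewness: {skewness:.2f}")
```

For a right-skewed variable the mean also sits above the median, which is a quick sanity check before plotting on a log scale.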

Another observation is that most of the borrowers are employed.

For the analysis of this sub-dataset, I dropped all rows with null values, since there were too few of them to bias the final result.
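Before dropping nulls it helps to measure how many rows would be lost. The following sketch shows one way to do that with pandas on a small synthetic frame; the sample rows are invented, and the choice of columns to check is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic rows; two of the five contain a missing value.
loans = pd.DataFrame({
    "BorrowerAPR": [0.30, 0.18, np.nan, 0.12, 0.09],
    "ProsperRating (Alpha)": ["HR", "C", "B", None, "AA"],
    "Occupation": ["Teacher", "Professional", "Executive",
                   "Teacher", "Professional"],
})

cols_of_interest = ["BorrowerAPR", "ProsperRating (Alpha)", "Occupation"]

# Share of rows with at least one null in the columns of interest.
missing_share = loans[cols_of_interest].isna().any(axis=1).mean()

# Keep only complete rows.
complete = loans.dropna(subset=cols_of_interest).reset_index(drop=True)

print(f"rows with nulls: {missing_share:.0%}")
print(f"rows kept: {len(complete)} of {len(loans)}")
```

If the missing share turns out to be large, or concentrated in one rating or occupation, dropping rows could bias the result and imputation might be worth considering instead.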

Bivariate Exploration¶

Question 1¶

Does a higher loan amount attract a lower BorrowerAPR? I predict it does, but rather than rely on my assumption, let the data show us graphically.

Observation¶

There is a negative correlation between LoanOriginalAmount and BorrowerAPR: as predicted, higher loan amounts carry a lower borrower annual percentage rate.
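The direction of this relationship can be checked numerically with a Pearson correlation coefficient. The sketch below uses six invented loans in which APR falls as the amount rises, so it only demonstrates the calculation, not the strength of the effect in the Prosper data.

```python
import pandas as pd

# Invented loans: larger amounts paired with lower APRs.
loans = pd.DataFrame({
    "LoanOriginalAmount": [2000, 4000, 10000, 15000, 25000, 35000],
    "BorrowerAPR": [0.35, 0.31, 0.22, 0.18, 0.13, 0.08],
})

# Pearson correlation between the two columns (pandas default method).
r = loans["LoanOriginalAmount"].corr(loans["BorrowerAPR"])

print(f"Pearson r = {r:.2f}")
```

A value of r close to -1 means a strong negative linear relationship, and a value near 0 means little linear relationship.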

Observations¶

  1. 50% of current loans are below 10k dollars, and the highest are in the 35k dollar range
  2. 75% of defaulted loans fall within 5k - 10k dollars
  3. 75% of loans that are past due (16-30 days) are below 15k dollars
  4. The highest defaulted loan amount is 15k dollars
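The percentages in the list above are the kind of values a box plot encodes, and they can also be computed directly per loan status. This is a minimal sketch on synthetic amounts; the output column names `median`, `q75` and `maximum` are arbitrary labels chosen here.

```python
import pandas as pd

# Synthetic loan amounts for two statuses.
loans = pd.DataFrame({
    "LoanStatus": ["Current"] * 4 + ["Defaulted"] * 3,
    "LoanOriginalAmount": [4000, 8000, 12000, 35000, 3000, 5000, 15000],
})

# Median, 75th percentile and maximum loan amount for each status.
summary = loans.groupby("LoanStatus")["LoanOriginalAmount"].agg(
    median="median",
    q75=lambda s: s.quantile(0.75),
    maximum="max",
)

print(summary)
```

Reading the table row by row gives statements of the form "50% of loans in this status are below the median" and "75% are below q75".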

Question 2¶

How are Prosper ratings distributed across employment statuses?

Observation¶

1. The Part-time, Retired, Self-employed and Not employed categories do not have enough data to show how employment status interacts with ProsperRating (Alpha).

2. Most of the employed borrowers were C-rated, followed by B and A respectively. Fewer than 5000 borrowers had an AA rating, which is the highest.
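Counts like these come from cross-tabulating the two categorical columns. The sketch below builds such a table on a few invented rows and orders the rating columns from HR (lowest) to AA (highest); the sample data and the `RATING_ORDER` name are assumptions for illustration.

```python
import pandas as pd

# Prosper ratings from lowest to highest.
RATING_ORDER = ["HR", "E", "D", "C", "B", "A", "AA"]

# Invented borrowers.
loans = pd.DataFrame({
    "EmploymentStatus": ["Employed", "Employed", "Employed", "Full-time",
                         "Self-employed", "Employed", "Retired"],
    "ProsperRating (Alpha)": ["C", "C", "B", "A", "D", "AA", "C"],
})

# Count borrowers per (employment status, rating) pair, then put the
# rating columns in their natural order, filling absent ratings with 0.
counts = pd.crosstab(
    loans["EmploymentStatus"], loans["ProsperRating (Alpha)"]
).reindex(columns=RATING_ORDER, fill_value=0)

print(counts)
```

Rows with very small totals, such as Retired here, are the ones where there is too little data to say anything about the rating mix.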

[Figure: Prosper Rating (Alpha) across EmploymentStatus]

Question 3¶

Does employment status influence the amount of loan requested?

Observation¶

Borrowers with Employed, Self-employed and Other employment status borrow higher amounts than Part-time, Retired, Full-time and Not employed borrowers.

[Figure: Relationship Between Loan Original Amount & Employment Status]

Question 5¶

Does any form of correlation exist between LoanOriginalAmount and BorrowerRate?

Observation¶

There is a negative correlation between LoanOriginalAmount and BorrowerRate.

As I expected, the interest rate is lower for higher loan amounts; the trendline of the scatter plot shows this negative correlation.

[Figure: Correlation Between BorrowerRate and Loan Original Amount]

Question 6¶

How does the BorrowerAPR compare to the loan Term?

Observation¶

Loans with a 36-month term have a higher BorrowerAPR than loans with a 12- or 60-month term.

[Figure: Relationship Between Term and BorrowerAPR]

Multivariate Exploration¶

My key interest here is to investigate how the relationship between LoanOriginalAmount and BorrowerAPR is affected by categorical variables such as Term and ProsperRating (Alpha).

As a bonus, I will also explore the same effect on the relationship between LoanOriginalAmount and BorrowerRate.

Question 1¶

What is the impact of Term on the relationship between loan amount and borrower APR, using regplot? (Bonus: replace BorrowerAPR with BorrowerRate and check whether the trend is the same.)

Observation¶

Generally, there is a negative correlation between LoanOriginalAmount and BorrowerRate for all 3 terms. A similar trend can be observed between LoanOriginalAmount and BorrowerAPR.
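A regplot draws a least-squares line through each panel, so the sign of the fitted slope per term summarises the trend numerically. The sketch below fits that line with `numpy.polyfit` for each group; `slope_by_group` is a hypothetical helper and the nine sample loans are invented to show a downward trend.

```python
import numpy as np
import pandas as pd


def slope_by_group(df, group_col, x_col, y_col):
    """Least-squares slope of y_col on x_col within each group."""
    slopes = {}
    for key, part in df.groupby(group_col):
        x = part[x_col].to_numpy(dtype=float)
        y = part[y_col].to_numpy(dtype=float)
        slope, _intercept = np.polyfit(x, y, deg=1)
        slopes[key] = slope
    return pd.Series(slopes, name=f"slope of {y_col} on {x_col}")


# Invented loans: three per term, APR falling as the amount rises.
loans = pd.DataFrame({
    "Term": [12, 12, 12, 36, 36, 36, 60, 60, 60],
    "LoanOriginalAmount": [2000, 5000, 10000, 4000, 10000, 25000,
                           8000, 15000, 35000],
    "BorrowerAPR": [0.25, 0.20, 0.15, 0.30, 0.22, 0.14, 0.26, 0.21, 0.16],
})

slopes = slope_by_group(loans, "Term", "LoanOriginalAmount", "BorrowerAPR")
print(slopes)
```

The same helper can be called with "ProsperRating (Alpha)" as the grouping column, or with BorrowerRate as `y_col`, to compare slope signs across other breakdowns.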

[Figures: Correlation Between BorrowerAPR and Loan Original Amount; Correlation Between BorrowerRate and Loan Original Amount]

Question 2¶

How does Prosper Rating affect the relationship between Borrower APR (Annual Percentage Rate) and Loan Original Amount?

Visualisation¶

Observation¶

For highly rated borrowers, the borrower APR and Loan Original Amount have a positive link. However, the relationship becomes negative as the rating drops from AA to HR. My guess is that Prosper purposefully increases the APR for high-rated customers as the requested loan amount grows, in order to maximise returns from the transaction (possibly because these customers have been with Prosper for a long time and are already loyal to the brand). In contrast, borrowers with lower Prosper ratings get lower APRs as the loan amount increases. I suspect this is done on purpose to entice new clients, who most likely have low ratings, to try out the service.

[Figure: Correlation Between BorrowerAPR and Loan Original Amount]

Question 3¶

How does Prosper Rating affect the relationship between Borrower Rate (Interest Rate) and Loan Original Amount?

Observation¶

A similar conclusion can be drawn for the relationship between Loan Original Amount and BorrowerRate as for Loan Original Amount vs BorrowerAPR above.

[Figure: Correlation Between BorrowerRate and Loan Original Amount]

Question 4¶

Using a Seaborn pointplot, can we see how loan term affects the relationship between ProsperRating and BorrowerAPR?

Observation¶

Highly rated borrowers (AA to B) have lower APRs, with an incremental rise as the loan term increases from 12 to 60 months. Poorly rated borrowers attract higher APRs.
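A pointplot shows the mean of the y variable for each category, so the same comparison can be laid out as a table of mean BorrowerAPR by rating and term. The sketch below does this with a pivot table on invented values for two ratings only; the numbers are chosen to mimic the described pattern and are not results from the Prosper data.

```python
import pandas as pd

# Invented loans for the highest (AA) and lowest (HR) ratings.
loans = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA", "AA", "AA", "HR", "HR", "HR", "AA", "HR"],
    "Term": [12, 36, 60, 12, 36, 60, 36, 36],
    "BorrowerAPR": [0.07, 0.09, 0.11, 0.36, 0.35, 0.33, 0.10, 0.36],
})

# Mean APR for each (rating, term) cell: the values a pointplot would mark.
mean_apr = loans.pivot_table(
    index="ProsperRating (Alpha)",
    columns="Term",
    values="BorrowerAPR",
    aggfunc="mean",
)

print(mean_apr.round(3))
```

Reading along a row shows how the APR changes with term for one rating, and reading down a column compares ratings at a fixed term.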

Question 6¶

Visualise the impact of Term on the relationship between ProsperRating (Alpha) & LoanOriginalAmount; and ProsperRating (Alpha) & StatedMonthlyIncome

Observation¶

Borrowers with a high monthly income and a high Prosper rating tend to take loans with a 12-month term.

[Figure: y-axis label StatedMonthlyIncome]

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?¶

For highly rated borrowers, the borrower APR and Loan Original Amount have a positive link. However, the relationship becomes negative as the rating drops from AA to HR.

Further exploration of the influence of loan term and Prosper rating on the original loan amount shows that, for all three terms, the loan amount increases as the rating improves.

Were there any interesting or surprising interactions between features?¶

Unexpectedly, the borrower APR and loan amount have a negative link when the borrower's Prosper rating is between HR and B, but a positive link when the rating is A or AA. Another intriguing finding is that for borrowers with ratings from HR to C, the borrower APR decreases as the loan term lengthens. However, the APR rises with the loan term for borrowers with ratings from B to AA.

References¶

1. Pmalo46

2. Amyra Fathy

3. Imsingla

4. Stackoverflow

5. Types Categorical

6. Point Plot